Principled Query Processing

Authors

  • Jussi Karlgren
  • Magnus Sahlgren
  • Rickard Cöster
Abstract

This year, the SICS team decided to concentrate on query processing and on the internal topical structure of the query, which we have identified as one of the major bottlenecks for cross-lingual access systems. In previous years, the SICS team investigated, among other issues, how to translate compounds. Compound translation is non-trivial because of dependencies between compound elements, and it has been handled in various ways for compounding languages such as Swedish. This year we decided to investigate the topical dependencies between query terms, under the hypothesis that the complexity of translating compounds is a special case of the more general problem of understanding the respective topicality of query terms. The question under investigation is how much each query term contributes in terms of topicality in the documents of the collection under consideration. If a query term is non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality, its weight should be boosted. Our base system is used with two different enhancements to test the hypothesis that boosting topically active terms is beneficial for retrieval results. Both schemes are based on an analysis of the distributional character of query terms: one uses the similarity of occurrence contexts between query terms; the other uses the likelihood of individual terms to appear topically in text. These are two different avenues of analysis and will most likely give different results if pursued further than these initial experiments. Both boosting schemes delivered uncontroversially improved results. These results will provide impetus for further study of the translation of complex terms, the question that prompted this set of experiments in the first place.

Similar resources

Selecting the most appropriate query language for using outer joins to extract data in Datalog mode in the DES deductive database system

Deductive database systems are designed on the basis of a logical data model. In a deductive database system, data are saved as facts, as opposed to a relational database management system (RDBMS), in which data are stored in tables. Datalog Educational System (DES) is a deductive database system whose default mode is Datalog mode. It can extract data to use outer joins with three query la...

Poor Usability in Data Processing

The databases community works hard on the scale, performance, and correctness of the storage and query processing systems that our users depend on. Researchers are therefore frustrated to see less principled, and often incorrect 2010s implementations of concepts that were introduced in the 1970s. The lens of usability can help us understand how certain systems see adoption, regardless of the so...

Efficient Processing of Ad-Hoc Top-k Aggregate Queries in OLAP

In this paper, we develop a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries in OLAP. Such queries provide the k groups with the highest aggregates to decision makers. Essential support of top-k aggregate queries is lacking in current RDBMSs, which process such queries in a naïve and overkill materialize-group-sort scheme, therefore can be prohibitively ...

Hierarchical Dirichlet Trees for Information Retrieval

We propose a principled probabilistic framework which uses trees over the vocabulary to capture similarities among terms in an information retrieval setting. This allows the retrieval of documents based not just on occurrences of specific query terms, but also on similarities between terms (an effect similar to query expansion). Additionally our principled generative model exhibits an effect si...

FluXQuery: An Optimizing XQuery Processor for Streaming XML Data

XML has established itself as the ubiquitous format for data exchange on the Internet. An imminent development is that of streams of XML data being exchanged and queried. Data management scenarios where XQuery [11] is evaluated on XML streams are becoming increasingly important and realistic, e.g. in e-commerce settings. Naturally, query engines employed for stream processing are main-memory-ba...

Distributed Query Monitoring through Convex Analysis: Towards Composable Safe Zones

Continuous tracking of complex data analytics queries over high-speed distributed streams is becoming increasingly important. Query tracking can be reduced to continuous monitoring of a condition over the global stream. Communication-efficient monitoring relies on locally processing stream data at the sites where it is generated, by deriving site-local conditions which collectively guarantee th...

Publication date: 2005